181 research outputs found

    Transcript profiling in Candida albicans reveals new cellular functions for the transcriptional repressors CaTup1, CaMig1 and CaNrg1.

    The pathogenic fungus Candida albicans contains homologues of the transcriptional repressors ScTup1, ScMig1 and ScNrg1 found in budding yeast. In Saccharomyces cerevisiae, ScMig1 targets the ScTup1/ScSsn6 complex to the promoters of glucose-repressed genes to repress their transcription. ScNrg1 is thought to act in a similar manner at other promoters. We have examined the roles of their homologues in C. albicans by transcript profiling with an array containing 2002 genes, representing about one quarter of the predicted number of open reading frames (ORFs) in C. albicans. The data revealed that CaNrg1 and CaTup1 regulate a different set of C. albicans genes from CaMig1 and CaTup1. This is consistent with the idea that CaMig1 and CaNrg1 target the CaTup1 repressor to specific subsets of C. albicans genes. However, CaMig1 and CaNrg1 repress other C. albicans genes in a CaTup1-independent fashion. The targets of CaMig1 and CaNrg1 repression, and phenotypic analyses of nrg1/nrg1 and mig1/mig1 mutants, indicate that these factors play differential roles in the regulation of metabolism, cellular morphogenesis and stress responses. Hence, the data provide important information about both the modes of action of these transcriptional regulators and their cellular roles. The transcript profiling data are available at http://www.pasteur.fr/recherche/unites/RIF/transcriptdata/

    Proteome sequence features carry signatures of the environmental niche of prokaryotes

    Background: Prokaryotic environmental adaptations occur at different levels within cells to ensure the preservation of genome integrity, proper protein folding and function as well as membrane fluidity. Although specific composition and structure of cellular components suitable for the variety of extreme conditions has already been postulated, a systematic study describing such adaptations has not yet been performed. We therefore explored whether the environmental niche of a prokaryote could be deduced from the sequence of its proteome. Finally, we aimed at finding the precise differences between proteome sequences of prokaryotes from different environments.

    Results: We analyzed the proteomes of 192 prokaryotes from different habitats. We collected detailed information about the optimal growth conditions of each microorganism. Furthermore, we selected 42 physico-chemical properties of amino acids and computed their values for each proteome. Further, on the same set of features we applied two fundamentally different machine learning methods, Support Vector Machines and Random Forests, to successfully classify between bacteria and archaea, halophiles and non-halophiles, as well as mesophiles, thermophiles and mesothermophiles. Finally, we performed feature selection by using Random Forests.

    Conclusions: To our knowledge, this is the first time that three different classification cases (domain of life, halophilicity and thermophilicity) of proteome adaptation are successfully performed with the same set of 42 features. The characteristic features of a specific adaptation constitute a signature that may help understanding the mechanisms of adaptation to extreme environments.
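
The abstract above describes computing amino acid property values for each proteome. As a rough illustration only, the sketch below averages a single property over all residues of a toy proteome. The choice of the Kyte-Doolittle hydropathy scale, the function name `proteome_property_mean`, and the composition-weighted mean are assumptions made for this example; the abstract does not say which 42 properties were used or how they were aggregated.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values, used here purely as an example property
# (assumption: the abstract does not list the 42 properties it used).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def proteome_property_mean(proteins, scale):
    """Composition-weighted mean of one amino acid property over a proteome.

    `proteins` is an iterable of protein sequences (strings). Residues that
    are missing from `scale` (e.g. 'X') are ignored.
    """
    counts = Counter()
    for seq in proteins:
        counts.update(aa for aa in seq.upper() if aa in scale)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no scorable residues in proteome")
    return sum(scale[aa] * n for aa, n in counts.items()) / total


if __name__ == "__main__":
    toy_proteome = ["MKVLA", "GGDE"]  # hypothetical sequences
    print(proteome_property_mean(toy_proteome, KYTE_DOOLITTLE))
```

Repeating this for each property would give one feature vector per proteome, which could then be passed to a classifier of the kind the abstract mentions.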

    A probabilistic model for gene content evolution with duplication, loss, and horizontal transfer

    We introduce a Markov model for the evolution of a gene family along a phylogeny. The model includes parameters for the rates of horizontal gene transfer, gene duplication, and gene loss, in addition to branch lengths in the phylogeny. The likelihood for the changes in the size of a gene family across different organisms can be calculated in O(N+hM^2) time and O(N+M^2) space, where N is the number of organisms, h is the height of the phylogeny, and M is the sum of family sizes. We apply the model to the evolution of gene content in Proteobacteria using the gene families in the COG (Clusters of Orthologous Groups) database.

    Phylogeny of Prokaryotes and Chloroplasts Revealed by a Simple Composition Approach on All Protein Sequences from Complete Genomes Without Sequence Alignment

    The complete genomes of living organisms have provided much information on their phylogenetic relationships. Similarly, the complete genomes of chloroplasts have helped to resolve the evolution of this organelle in photosynthetic eukaryotes. In this paper we propose an alternative method of phylogenetic analysis using compositional statistics for all protein sequences from complete genomes. This new method is conceptually simpler than, and computationally as fast as, the ones proposed by Qi et al. (2004b) and Chu et al. (2004). The same data sets used in Qi et al. (2004b) and Chu et al. (2004) are analyzed using the new method. Our distance-based phylogenetic tree of the 109 prokaryotes and eukaryotes agrees with the biologists' tree of life based on 16S rRNA comparison in the great majority of basic branchings and in most lower taxa. Our phylogenetic analysis also shows that the chloroplast genomes are separated into two major clades corresponding to chlorophytes s.l. and rhodophytes s.l. The interrelationships among the chloroplasts are largely in agreement with the current understanding of chloroplast evolution.
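
To make the idea of an alignment-free, composition-based distance concrete, here is a minimal sketch under stated assumptions. It counts overlapping k-mers across all proteins of a genome, normalizes them to frequencies, and compares two genomes by cosine distance. The function names, the default `k=3`, and the plain cosine distance are illustrative choices, not the statistic defined in the paper, which the abstract does not spell out.

```python
from collections import Counter
from math import sqrt


def composition_vector(proteins, k=3):
    """Relative frequencies of overlapping k-mers over all protein sequences."""
    counts = Counter()
    for seq in proteins:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


def cosine_distance(u, v):
    """1 - cosine similarity between two sparse frequency vectors (dicts)."""
    dot = sum(x * v.get(kmer, 0.0) for kmer, x in u.items())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cannot compare an empty composition vector")
    return 1.0 - dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Hypothetical toy "genomes", each a list of protein sequences.
    genome_a = ["MKVLAAGIV", "MKVLSAGLV"]
    genome_b = ["MKTLAAGIV", "MRVLSAGLV"]
    va = composition_vector(genome_a)
    vb = composition_vector(genome_b)
    print(cosine_distance(va, vb))
```

A matrix of such pairwise distances could then be handed to any distance-based tree-building method, such as neighbor joining.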

    CandidaDB: a genome database for Candida albicans pathogenomics

    CandidaDB is a database dedicated to the genome of the most prevalent systemic fungal pathogen of humans, Candida albicans. CandidaDB is based on an annotation of the Stanford Genome Technology Center C.albicans genome sequence data by the European Galar Fungail Consortium. CandidaDB Release 2.0 (June 2004) contains information pertaining to Assembly 19 of the genome of C.albicans strain SC5314. The current release contains 6244 annotated entries corresponding to 130 tRNA genes and 5917 protein-coding genes. For these, it provides tentative functional assignments along with numerous pre-run analyses that can assist the researcher in the evaluation of gene function for the purpose of specific or large-scale analysis. CandidaDB is based on GenoList, a generic relational data schema and a World Wide Web interface that has been adapted to the handling of eukaryotic genomes. The interface allows users to browse easily through genome data and retrieve information. CandidaDB also provides more elaborate tools, such as pattern searching, that are tightly connected to the overall browsing system. As the C.albicans genome is diploid and still incompletely assembled, CandidaDB provides tools to browse the genome by individual supercontigs and to examine information about allelic sequences obtained from complementary contigs. CandidaDB is accessible at http://genolist.pasteur.fr/CandidaDB

    Beyond representing orthology relations by trees

    Reconstructing the evolutionary past of a family of genes is an important aspect of many genomic studies. To help with this, simple relations on a set of sequences called orthology relations may be employed. In addition to being interesting from a practical point of view, they are also attractive from a theoretical perspective in that, e.g., a characterization is known for when such a relation is representable by a certain type of phylogenetic tree. For an orthology relation inferred from real biological data it is, however, generally too much to hope that it satisfies that characterization. Rather than trying to correct the data in some way or another, which has its own drawbacks, we propose as an alternative to represent an orthology relation δ in terms of a structure more general than a phylogenetic tree, called a phylogenetic network. To compute such a network in the form of a level-1 representation for δ, we formalize an orthology relation in terms of the novel concept of a symbolic 3-dissimilarity, which is motivated by the biological concept of a "cluster of orthologous groups", or COG for short. For such maps, which assign symbols rather than real values to elements, we introduce the novel Network-Popping algorithm, which has several attractive properties. In addition, we characterize an orthology relation δ on some set X that has a level-1 representation in terms of eight natural properties for δ as well as in terms of level-1 representations of orthology relations on certain subsets of X.

    Natural History, Microbes and Sequences: Shouldn't We Look Back Again to Organisms?

    The discussion on the existence of prokaryotic species is reviewed. The demonstration that several different mechanisms of genetic exchange and recombination exist has led some to a radical rejection of the possibility of bacterial species and, in general, the applicability of traditional classification categories to the prokaryotic domains. However, in spite of intense gene traffic, prokaryotic groups are not continuously variable but form discrete clusters of phenotypically coherent, well-defined, diagnosable groups of individual organisms. Molecularization of life sciences has led to biased approaches to the issue of the origins of biodiversity, which has resulted in the increasingly extended tendency to emphasize genes and sequences and not give proper attention to organismal biology. As argued here, molecular and organismal approaches should be seen as complementary and not opposed views of biology.

    A novel series of compositionally biased substitution matrices for comparing Plasmodium proteins

    Background: The most common substitution matrices currently used (BLOSUM and PAM) are based on protein sequences with average amino acid distributions; thus they do not represent a fully accurate substitution model for proteins characterized by a biased amino acid composition. This problem has been addressed recently by adjusting existing matrices; however, to date, no empirical approach has been taken to build matrices which offer a substitution model for comparing proteins sharing an amino acid compositional bias. Here, we present a novel procedure to construct series of symmetrical substitution matrices to align proteins from similarly biased Plasmodium proteomes.

    Results: We generated substitution matrices by selecting from the BLOCKS database those multiple alignments with a compositional bias similar to that of P. falciparum and P. yoelii proteins. A novel 'fuzzy' clustering method was adopted to group sequences within these alignments, showing that this method retains more complete information on the amino acid substitutions when compared to hierarchical clustering. We assessed the performance against the BLOSUM62 series and showed that the usage of our matrices results in an improvement in the performance of BLAST database searches, greatly reducing the number of false positive hits. We then demonstrated applications of the use of novel matrices to improve the annotation of homologs between the two Plasmodium species and to classify members of the P. falciparum RIFIN/STEVOR family.

    Conclusion: We confirmed that in the case of compositionally biased proteins, standard BLOSUM matrices are not suited for optimal alignments, and specific substitution matrices are required. In addition, we showed that the usage of these matrices leads to a reduction of false positive hits, facilitating the automatic annotation process.

    A Combination of Compositional Index and Genetic Algorithm for Predicting Transmembrane Helical Segments

    Transmembrane helix (TMH) topology prediction is becoming a focal problem in bioinformatics because the structure of TM proteins is difficult to determine using experimental methods. Therefore, methods that can computationally predict the topology of helical membrane proteins are highly desirable. In this paper we introduce TMHindex, a method for detecting TMH segments using only the amino acid sequence information. Each amino acid in a protein sequence is represented by a Compositional Index, which is deduced from a combination of the difference in amino acid occurrences in TMH and non-TMH segments in training protein sequences and the amino acid composition information. Furthermore, a genetic algorithm was employed to find the optimal threshold value for the separation of TMH segments from non-TMH segments. The method successfully predicted 376 out of the 378 TMH segments in a dataset consisting of 70 test protein sequences. The sensitivity and specificity for classifying each amino acid in every protein sequence in the dataset were 0.901 and 0.865, respectively. To assess the generality of TMHindex, we also tested the approach on another standard 73-protein 3D helix dataset. TMHindex correctly predicted 91.8% of proteins based on TM segments. The level of accuracy achieved using TMHindex in comparison to other recent approaches for predicting the topology of TM proteins is a strong argument in favor of our proposed method. Availability: The datasets and software, together with supplementary materials, are available at: http://faculty.uaeu.ac.ae/nzaki/TMHindex.htm
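
The following is a loose sketch of the general idea of scoring residues by how differently they occur in TMH and non-TMH training segments, then thresholding a smoothed score. It is not the TMHindex implementation: the log-ratio form of the index, the pseudocount, the sliding window, the fixed threshold (the abstract tunes its threshold with a genetic algorithm), and all function names are assumptions made for illustration.

```python
from collections import Counter
from math import log

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def compositional_index(tmh_segments, non_tmh_segments, pseudocount=1.0):
    """Per-residue log-ratio of frequency in TMH vs non-TMH training segments.

    Assumption: a log-ratio with a pseudocount stands in for the paper's
    Compositional Index, whose exact formula the abstract does not give.
    """
    tmh = Counter("".join(tmh_segments))
    non = Counter("".join(non_tmh_segments))
    n_aa = len(AMINO_ACIDS)
    tmh_total = sum(tmh[a] for a in AMINO_ACIDS) + pseudocount * n_aa
    non_total = sum(non[a] for a in AMINO_ACIDS) + pseudocount * n_aa
    return {
        a: log(((tmh[a] + pseudocount) / tmh_total)
               / ((non[a] + pseudocount) / non_total))
        for a in AMINO_ACIDS
    }


def predict_segments(seq, index, window=5, threshold=0.0):
    """Return half-open (start, end) spans whose smoothed score exceeds threshold."""
    half = window // 2
    scores = []
    for i in range(len(seq)):
        win = seq[max(0, i - half): i + half + 1]
        scores.append(sum(index.get(a, 0.0) for a in win) / len(win))
    segments, start = [], None
    for i, score in enumerate(scores):
        if score > threshold and start is None:
            start = i
        elif score <= threshold and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(seq)))
    return segments


if __name__ == "__main__":
    # Hypothetical training segments, far too small for real use.
    idx = compositional_index(["LLLL", "IIVV"], ["DDEE", "KKRR"])
    print(predict_segments("DDDDLLLLLLDDDD", idx, window=1))
```

In a fuller version, `threshold` (and possibly `window`) would be chosen by optimizing prediction accuracy on labeled training proteins rather than being fixed by hand.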